Abstract. It is well known that an arbitrary graphical model of statistical inference
defined on a tree, i.e. on a graph without loops, is solved exactly and efficiently by
an iterative Belief Propagation (BP) algorithm converging to the unique minimum of the
so-called Bethe free energy functional. For a general graphical model on a loopy graph
the functional may show multiple minima, the iterative BP algorithm may converge
to one of the minima or may not converge at all, and the global minimum of the
Bethe free energy functional is not guaranteed to correspond to the optimal
Maximum-Likelihood (ML) solution in the zero-temperature limit. However, there are exceptions
to this general rule, discussed in [12] and [2] in two different contexts, where the
zero-temperature version of the BP algorithm finds the ML solution for special models on
graphs with loops. These two models share a key feature: their ML solutions can be
found by an efficient Linear Programming (LP) algorithm with a Totally-Uni-Modular
(TUM) matrix of constraints. Generalizing the two models we consider a class of
graphical models reducible in the zero-temperature limit to LP with TUM constraints.
Assuming that a gedanken algorithm, g-BP, finding the global minimum of the Bethe
free energy is available, we show that in the limit of zero temperature g-BP outputs the
ML solution. Our consideration is based on the equivalence established between the gapless
Linear Programming (LP) relaxation of the graphical model in the $T \to 0$ limit and
the respective LP version of the Bethe free energy minimization.
1. Introduction

Belief Propagation (BP) is an algorithm finding the ML solution or marginal probabilities
on a graph without loops, a tree. The algorithm was introduced in [8] as an efficient
heuristic for decoding of sparse (so-called graphical) codes and it was independently
considered in the context of graphical models of artificial intelligence [16]. Originally
the algorithm was primarily thought of as an iterative procedure. The authors of [21], [23], inspired by
earlier works of [3] and [H] in statistical physics, suggested using the more fundamental
notion of the Bethe free energy. Extrema of the Bethe free energy represent fixed points
of the iterative BP algorithm on graphs with cycles. Equations describing the stationary
points of the Bethe free energy and the fixed points of the iterative BP are called Belief
Propagation, or Bethe-Peierls, BP equations.

The significance of BP, understood as an algorithm looking for a minimum of the 
Bethe free energy, was further elucidated within the framework coined loop calculus 
[5], [6]. It was shown that an algorithm finding an extremum of the Bethe free energy is not
just an approximation/heuristic in the loopy case, but it allows explicit reconstruction
of the exact inference in terms of a series, where each term corresponds to a loop on the
graph.

If the graphical model is dense there are many loops, and thus many contributions to 
the loop series. However, not all loops are equal. Thus, considering models characterized 
by the same graph but different factor functions or local weights (exact definitions 
will follow) one expects strong sensitivity of an individual loop contribution (and of its
significance within the loop series) to the factor functions. In this context it is of
interest to study the following question: are there graphical models, defined on an 
arbitrary graph but with specially tuned factor functions, such that BP provides exact 
inference? 

A positive answer to this question was given, independently and for two different
models, by [12] and [2]. It was shown in [12] that for a graphical model defined on
an arbitrary graph in terms of binary variables with pairwise sub-modular interaction
a properly defined version of BP (linear programming relaxation underlying the
tree-reweighted method of [20]) yields a globally optimal Maximum Likelihood solution. This
model is equivalent to the Ferromagnetic Random Field Ising (FRFI) model popular in
statistical physics of disordered systems, see e.g. [10]. The maximum weight matching
problem on a bipartite graph was analyzed in [2] and later in [18], where it was shown
that, in spite of the fact that the underlying graph has many short cycles, an algorithm of
BP type does converge to the correct ML assignment. This consideration was also extended
to the problem of weighted b-matchings on an arbitrary graph (which is yet another
problem solvable exactly by BP) in [H [11]. Closely related general results, discussing
a convexified version of the Bethe free energy and an iterative convex-BP scheme converging
to the respective LP, were reported in [22], [9].

Figure 1. Scheme illustrating the sequence of transformations and relations discussed
in the paper.

In this paper we use the Bethe free energy approach of [23] to suggest a
complementary and unifying explanation for the remarkable, and somewhat surprising,
results of [12], [2]. In the two subsequent Sections we consider two models, the FRFI model discussed
in [12] and a binary model with Totally Uni-Modular (TUM) constraints generalizing
the weighted matching problem considered in [2]. Statistical weights are defined for both
models in terms of a characteristic temperature, $T$. Our strategy in dealing with both
models is illustrated in Fig. 1. It consists of the following three steps.

• Starting from the original setting we first go anti-clockwise, getting an Integer
Programming (IP) formulation for the ML, $T \to 0$, version of the problem. The
most important feature of the two models is that the LP relaxation of the respective
IP, shown as LP-A in the Figure, is tight/exact. In both cases this reduction from
IP to LP is exact due to the Total-Uni-Modularity (TUM) feature of the underlying
matrix of constraints.

• Then we return to the original setting and start moving clockwise (see the Figure),
first to the Bethe free energy formulation of the problem. We call the gedanken
algorithm, finding global minima of the Bethe free energy, g-BP.‡ In the $T \to 0$ limit
the Bethe free energy turns into the respective self-energy (the entropy term multiplied by
temperature is irrelevant), which is a linear functional of beliefs. Thus one gets to an
LP problem here as well, the one shown as LP-B in the Figure. This transformation
from g-BP to LP-B is analogous to the similar relation between g-BP and LP-decoding
introduced in coding theory in [3, [19] .

• Finally, we show that LP-A and LP-B are identical, thus demonstrating that g-BP
in the $T \to 0$ limit outputs the ML solution.

‡ The Bethe free energy is non-convex, therefore finding the global minimum at a finite temperature is
not necessarily straightforward. Acknowledging the importance of the problem, we will not discuss in this
manuscript the plausibility and details of a respective iterative algorithm convergent to the global minimum
of the Bethe free energy. We refer the interested reader to the comprehensive discussion of such iterative
schemes in the general context in [22], [9] and for the FRFI model and the maximum weight matching model in
[12] and in [2], [18] respectively.

Note that convexity of the Bethe free energy at finite temperature, playing the key role
in the analysis of [12], [22], [9], is not required in our consideration. Moreover, the Bethe
free energy of the binary model with TUM constraints is generally not convex.

2. Ferromagnetic Random Field Ising model 

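Consider an undirected graph $\mathcal{G}$, consisting of $n$ vertices, $\mathcal{V} = \{1, \cdots, n\}$, and weighted
edges, $\mathcal{E}$, with the weight matrix $(J_{ij} | i,j = 1,\cdots,n)$ such that $J_{ij} > 0$ whenever the two
vertices are connected by an edge, i.e. $i \in j$ or $j \in i$, and $J_{ij} = 0$
otherwise. It is also useful to introduce the directed version of the graph, $\mathcal{G}_d$, where
any undirected edge of $\mathcal{G}$ is replaced by two directed edges of $\mathcal{G}_d$, with the weights
$J_{i\to j} = J_{j\to i} = J_{ij}/2$ respectively. The Ferromagnetic Random Field Ising (FRFI)
model is defined by the following statistical weight associated with any configuration
$\sigma = (\sigma_i = \pm 1 | i = 1, \cdots, n)$ on $\mathcal{V}$:

$$
p(\sigma) = Z^{-1}\exp\left(\frac{1}{2T}\sum_{(i,j)\in\mathcal{G}} J_{ij}\sigma_i\sigma_j + \frac{1}{T}\sum_{i} h_i\sigma_i\right)
= Z^{-1}\exp\left(\frac{1}{2T}\sum_{(i\to j)\in\mathcal{G}_d} J_{i\to j}\sigma_i\sigma_j + \frac{1}{T}\sum_{i} h_i\sigma_i\right),
\tag{1}
$$

where $h_i$ can be positive or negative; $T$ is the temperature; $Z$ is the partition function,
enforcing the probability normalization condition, $\sum_{\sigma} p(\sigma) = 1$; and $(i,j)$/$(i\to j)$
marks an undirected/directed edge of $\mathcal{G}$/$\mathcal{G}_d$ connecting the two neighbors $i$ and $j$.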

2.1. From FRFI to the Min-Cut Problem 

The Maximum Likelihood (ground state) solution of Eq. (1) turns into the problem of
quadratic integer programming

$$
\min_{\sigma}\left(-\frac{1}{2}\sum_{(i,j)\in\mathcal{G}} J_{ij}\sigma_i\sigma_j - \sum_{i} h_i\sigma_i\right)\Bigg|_{\forall i:\ \sigma_i=\pm 1}.
\tag{2}
$$

It is well known that any sub-modular energy function (and the quadratic function in
Eq. (2) is of this type) can be minimized in polynomial time by reducing the task to
the maximum flow/min-cut problem [1], [10]. In this Subsection we reproduce these
known results.

To unify linear and quadratic terms in Eq. (2), one constructs a new graph, $\mathcal{G}'_d$,
adding two new nodes to $\mathcal{G}_d$: source ($s$) and destination ($d$), with $\sigma_s = +1$ and $\sigma_d = -1$
respectively. The ($s$)-node is linked to all the nodes of $\mathcal{G}_d$ with $h_i > 0$, while any node
of $\mathcal{G}_d$ with $h_i < 0$ is linked to ($d$). Weights of the newly introduced directed edges of $\mathcal{G}'_d$
are

$$
J_{s\to i} = 2h_i, \ \text{if } h_i > 0; \qquad J_{i\to d} = 2|h_i|, \ \text{if } h_i < 0.
\tag{3}
$$

This results in the following version of Eq. (2):

$$
\min_{\sigma}\left(-\frac{1}{2}\sum_{(i\to j)\in\mathcal{G}'_d} J_{i\to j}\sigma_i\sigma_j\right)\Bigg|_{\forall i\in\mathcal{G}'_d:\ \sigma_i=\pm 1;\ \sigma_s=+1;\ \sigma_d=-1}.
\tag{4}
$$

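Reduction from the quadratic integer programming (4) to an integer linear
programming is the next step. This is achieved via transformation to the edge variables,

$$
\forall (i\to j)\in\mathcal{G}'_d:\quad \eta_{i\to j} = \frac{(1+\sigma_i)(1-\sigma_j)}{4} =
\begin{cases} 1, & \sigma_i = +1,\ \sigma_j = -1;\\ 0, & \text{otherwise}. \end{cases}
\tag{5}
$$

The relations can also be restated

$$
\forall (i\to j),(j\to i)\in\mathcal{G}_d:\quad \sigma_i\sigma_j + \sigma_j\sigma_i = 2 - 4(\eta_{i\to j} + \eta_{j\to i}),
\tag{6}
$$

$$
\forall (i\to d),(s\to j)\in\mathcal{G}'_d:\quad \sigma_i\sigma_d = 1 - 2\eta_{i\to d}, \quad \sigma_s\sigma_j = 1 - 2\eta_{s\to j}.
\tag{7}
$$

Therefore, taking into account that $J_{i\to j} = J_{j\to i}$ for any $(i\to j),(j\to i)\in\mathcal{G}_d$,
substituting Eqs. (5)-(7) into Eq. (4) and changing variables from $\sigma_i = \pm 1$ to $p_i =
(1-\sigma_i)/2 = 0, 1$ one arrives at

$$
-\frac{1}{2}\sum_{(i\to j)\in\mathcal{G}'_d} J_{i\to j}
+ \min_{\{\eta,p\}}\left(2\sum_{(i\to j)\in\mathcal{G}_d} J_{i\to j}\eta_{i\to j}
+ \sum_{(s\to i)\in\mathcal{G}'_d} J_{s\to i}\eta_{s\to i}
+ \sum_{(i\to d)\in\mathcal{G}'_d} J_{i\to d}\eta_{i\to d}\right)
\quad \text{s.t.} \quad
\begin{cases}
\forall (i\to j)\in\mathcal{G}'_d: & \eta_{i\to j} = 0,1; \quad p_i - p_j + \eta_{i\to j} = 0, 1;\\
\forall i\in\mathcal{G}'_d: & p_i = 0,1; \quad p_s = 0,\ p_d = 1.
\end{cases}
\tag{8}
$$

This expression is nothing but the integer programming formulation of the famous
min-cut problem, calculating the minimum weight cut splitting all the nodes of the directed
graph into two parts such that the group including the source node has all variables in
the 0 state while the other group, including the destination node, has all variables in
the 1 state.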
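The edge-variable relations can be checked by direct enumeration of the spin values. The sketch below assumes that the edge variable $\eta_{i\to j}$ is the indicator of $\sigma_i = +1$, $\sigma_j = -1$ (our reading, chosen to be consistent with Eqs. (6), (7) and with the cut constraint); the function names are illustrative only.

```python
from itertools import product

def eta(sigma_i, sigma_j):
    """Assumed edge variable: 1 if sigma_i = +1 and sigma_j = -1, else 0."""
    return (1 + sigma_i) * (1 - sigma_j) // 4

def check_edge_relations():
    """Verify Eqs. (6), (7) and the cut constraint for all spin values."""
    for si, sj in product((-1, +1), repeat=2):
        # Eq. (6): sigma_i sigma_j + sigma_j sigma_i = 2 - 4 (eta_ij + eta_ji)
        assert si * sj + sj * si == 2 - 4 * (eta(si, sj) + eta(sj, si))
        # p = (1 - sigma) / 2 and the constraint p_i - p_j + eta_ij in {0, 1}
        p_i, p_j = (1 - si) // 2, (1 - sj) // 2
        assert p_i - p_j + eta(si, sj) in (0, 1)
    for si in (-1, +1):
        # Eq. (7) with the terminals fixed to sigma_s = +1, sigma_d = -1
        assert si * (-1) == 1 - 2 * eta(si, -1)   # edge i -> d
        assert (+1) * si == 1 - 2 * eta(+1, si)   # edge s -> i
    return True
```

Calling `check_edge_relations()` runs through all four spin pairs and both terminal edges; it raises an `AssertionError` if any of the relations fails.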

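In any optimal $\{\eta, p\}$ configuration of Eq. (8), for any pair of directed edges
$(i\to j),(j\to i)\in\mathcal{G}_d$ that is cut ($p_i \ne p_j$) either $\eta_{i\to j} = 0$ and $\eta_{j\to i} = 1$ or
$\eta_{i\to j} = 1$ and $\eta_{j\to i} = 0$, while $\eta_{i\to j} = \eta_{j\to i} = 0$ if the pair is not cut.
This suggests that Eq. (8) can be restated in terms of the
undirected graph $\mathcal{G}'$, equivalent to the original $\mathcal{G}$ supplemented by the source and
destination vertices and edges with the following positive weights

$$
J_{si} = 2h_i, \ \text{if } h_i > 0; \qquad J_{id} = 2|h_i|, \ \text{if } h_i < 0.
\tag{9}
$$

Introducing the undirected edge variables $\eta_{ij} = \eta_{i\to j} + \eta_{j\to i}$, one derives the following undirected version of Eq. (8):

$$
-\frac{1}{2}\sum_{(i,j)\in\mathcal{G}'} J_{ij} + \min_{\{\eta,p\}}\sum_{(i,j)\in\mathcal{G}'} J_{ij}\eta_{ij}
\quad \text{s.t.} \quad
\begin{cases}
\forall i\in\mathcal{G}': & p_i = 0,1;\\
\forall (i,j)\in\mathcal{G}': & \eta_{ij} = 0,1; \quad p_i - p_j + \eta_{ij} \ge 0;\\
& p_s = 0,\ p_d = 1.
\end{cases}
\tag{10}
$$

Here and below the edge constraint is imposed for both orderings of the pair $(i,j)$.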
The min-cut problem (10) is solvable in polynomial time. Moreover, the
solution of the Integer Programming Eq. (10) and the solution of the respective relaxed
LP-A,

$$
\min_{\{\eta,p\}}\sum_{(i,j)\in\mathcal{G}'} J_{ij}\eta_{ij}
\quad \text{s.t.} \quad
\begin{cases}
\forall i\in\mathcal{G}': & 1 \ge p_i \ge 0;\\
\forall (i,j)\in\mathcal{G}': & p_i - p_j + \eta_{ij} \ge 0;\\
& p_s = 0,\ p_d = 1,
\end{cases}
\tag{11}
$$

are identical. The tightness of the relaxation is, e.g., discussed in [15]. (See Ch. 6.1
and specifically Theorems 6.1, 6.2 in [15].) Also, this observation is closely related to the
fact that the matrix of constraints in the max-flow problem, which is dual to Eq. (10),
is Totally Uni-Modular (TUM), i.e. such that any square minor of the matrix has
determinant which is 0, +1 or −1. (See e.g. Ch. 13.2 of [15] for discussion of the TUM
IP/LP problems.)
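As an illustration of the reduction, the following minimal sketch compares the ground state energy of a small loopy FRFI instance, found by exhaustive enumeration, with the value obtained from a minimum cut. It assumes the energy $-\frac{1}{2}\sum_{(i,j)} J_{ij}\sigma_i\sigma_j - \sum_i h_i\sigma_i$ with every undirected edge counted once, the convention under which the terminal weights $2|h_i|$ of Eq. (9) are consistent. The max-flow routine is a plain Edmonds-Karp implementation (our choice, not prescribed by the text), and the instance and all names are hypothetical.

```python
from collections import deque
from itertools import product

def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dense capacity matrix (equals the min cut)."""
    n = len(cap)
    res = [row[:] for row in cap]          # residual capacities
    total = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:   # breadth-first search for a path
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:                # no augmenting path is left
            return total
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                      # push flow along the path
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        total += bottleneck

def frfi_energy(J, h, sigma):
    """Assumed energy: -(1/2) sum_{i<j} J_ij s_i s_j - sum_i h_i s_i."""
    n = len(h)
    pair = sum(J[i][j] * sigma[i] * sigma[j]
               for i in range(n) for j in range(i + 1, n))
    return -0.5 * pair - sum(h[i] * sigma[i] for i in range(n))

def ground_energy_brute_force(J, h):
    return min(frfi_energy(J, h, s) for s in product((-1, +1), repeat=len(h)))

def ground_energy_min_cut(J, h):
    """Constant term plus the min cut of the extended graph with weights (9)."""
    n = len(h)
    s, d = n, n + 1
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        for j in range(n):
            if i != j:
                cap[i][j] = J[i][j]        # undirected edge: J_ij both ways
        if h[i] > 0:
            cap[s][i] = 2 * h[i]
        elif h[i] < 0:
            cap[i][d] = -2 * h[i]
    total_weight = (sum(J[i][j] for i in range(n) for j in range(i + 1, n))
                    + sum(2 * abs(x) for x in h))
    return -0.5 * total_weight + max_flow(cap, s, d)

# Hypothetical instance: 4 spins on a graph with loops, non-negative couplings.
J_EXAMPLE = [[0, 2, 2, 1], [2, 0, 1, 0], [2, 1, 0, 3], [1, 0, 3, 0]]
H_EXAMPLE = [3, -1, -2, 1]
```

The two routines `ground_energy_brute_force(J_EXAMPLE, H_EXAMPLE)` and `ground_energy_min_cut(J_EXAMPLE, H_EXAMPLE)` are expected to agree, the former at exponential and the latter at polynomial cost.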

2.2. Bethe Free Energy and Belief Propagation for FRFI 

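Discussing the FRFI model defined in Eq. (1) and following the general heuristic
approach to graphical models suggested in [23], one introduces beliefs, i.e. estimated
probabilities, for vertices and edges, $b_i(\sigma_i)$, $b_{ij}(\sigma_i,\sigma_j)$, related to each other according
to

$$
\forall i\ \&\ \forall j\in i:\quad b_i(\sigma_i) = \sum_{\sigma_j} b_{ij}(\sigma_i,\sigma_j),
\tag{12}
$$

and also satisfying the obvious normalization condition

$$
\forall i:\quad \sum_{\sigma_i} b_i(\sigma_i) = 1.
\tag{13}
$$

Then the Bethe free energy functional of the beliefs is defined as

$$
\mathcal{F} = E - TS, \quad
E = -\frac{1}{2}\sum_{(i,j)\in\mathcal{G}}\sum_{\sigma_i,\sigma_j} b_{ij}(\sigma_i,\sigma_j) J_{ij}\sigma_i\sigma_j - \sum_{i}\sum_{\sigma_i} b_i(\sigma_i) h_i\sigma_i,
\tag{14}
$$

$$
S = -\sum_{(i,j)\in\mathcal{G}}\sum_{\sigma_i,\sigma_j} b_{ij}(\sigma_i,\sigma_j)\ln b_{ij}(\sigma_i,\sigma_j) + \sum_{i}(q_i - 1)\sum_{\sigma_i} b_i(\sigma_i)\ln b_i(\sigma_i),
\tag{15}
$$

where $q_i$ is the number of neighbors of vertex $i$.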

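Introducing Lagrangian multipliers, $\mu$ and $\lambda$, associated with the constraints (12), (13), one
defines the Lagrangian functional

$$
\mathcal{L} = \mathcal{F} + \sum_{i}\sum_{j\in i}\sum_{\sigma_i}\mu_{ij}(\sigma_i)\left(b_i(\sigma_i) - \sum_{\sigma_j} b_{ij}(\sigma_i,\sigma_j)\right) + \sum_{i}\lambda_i\left(\sum_{\sigma_i} b_i(\sigma_i) - 1\right).
\tag{16}
$$
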
Looking for the stationary point of the Lagrangian over all the parameters (the beliefs 
and the Lagrangian multipliers) will define the Belief Propagation (BP) equations. 
Iterative solution of the BP equations constitutes the celebrated BP algorithm, which 
is often used as an efficient heuristic for estimating marginal probabilities in sparse 
graphical models. 
2.2.1. Ground State. In the $T \to 0$ limit the entropy terms in the expression for the
Bethe free energy in Eqs. (14), (15) can be neglected and the task of finding the absolute
minimum of the Bethe free energy functional turns into minimization of the self-energy,
$E$ from Eq. (14), under the set of constraints (12), (13). Both the optimization functional
and the constraints are linear in the beliefs, therefore one gets here the following Linear
Programming optimization:

$$
\min_{\{b_i;\, b_{ij}\}}\left(-\frac{1}{2}\sum_{(i,j)\in\mathcal{G}}\sum_{\sigma_i,\sigma_j} b_{ij}(\sigma_i,\sigma_j) J_{ij}\sigma_i\sigma_j - \sum_{i}\sum_{\sigma_i} b_i(\sigma_i) h_i\sigma_i\right)\Bigg|_{\text{Eqs. (12), (13)}},
\tag{17}
$$

where it is also assumed that all the beliefs are non-negative and smaller than or equal
to unity (as we are looking only for physically sensible solutions of the optimization
problem).

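Making the transformation from the original graph $\mathcal{G}$ to its extended version, $\mathcal{G}'$,
i.e. introducing new edges with weights defined in Eqs. (9), and requiring that the spin
values of the source/destination are fixed to ±1 respectively, i.e. $b_s(+) = b_d(-) = 1$ and
$b_s(-) = b_d(+) = 0$, one rewrites Eq. (17) as

$$
\min_{\{b_i;\, b_{ij}\}}\left(-\frac{1}{2}\sum_{(i,j)\in\mathcal{G}'}\sum_{\sigma_i,\sigma_j} b_{ij}(\sigma_i,\sigma_j) J_{ij}\sigma_i\sigma_j\right)
\quad \text{s.t.} \quad
\begin{cases}
\forall i,\ (i,j)\in\mathcal{G}': & b_i(\sigma_i) = \sum_{\sigma_j} b_{ij}(\sigma_i,\sigma_j);\\
\forall i\in\mathcal{G}': & \sum_{\sigma_i} b_i(\sigma_i) = 1;\\
& b_s(+) = 1\ \&\ b_d(-) = 1.
\end{cases}
\tag{18}
$$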

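The structure of the optimization functional in Eq. (18) suggests reducing the
number of variables (beliefs), thus keeping only the edge variables

$$
\mu_{ij} = b_{ij}(+,-) + b_{ij}(-,+) = 1 - b_{ij}(+,+) - b_{ij}(-,-),
\tag{19}
$$

defined as the probabilities to observe the edge either in the state $(+,-)$ or in the
state $(-,+)$. Thus, by construction, $1 \ge \mu_{ij} \ge 0$. The $\mu_{ij}$ variables defined at different
edges are related to each other through local beliefs, $\pi_i = b_i(-) = 1 - b_i(+)$, which all
satisfy $0 \le \pi_i \le 1$. Taking all these observations into account one rewrites Eq. (18), up to the additive constant $-\frac{1}{2}\sum_{(i,j)\in\mathcal{G}'} J_{ij}$, as

$$
\min_{\{\mu,\pi\}}\sum_{(i,j)\in\mathcal{G}'} J_{ij}\mu_{ij}
\quad \text{s.t.} \quad
\begin{cases}
\forall i\in\mathcal{G}',\ \forall j\in i: & \pi_i - \pi_j + \mu_{ij} \ge 0; \quad 1 \ge \mu_{ij} \ge 0;\\
\forall i\in\mathcal{G}': & 1 \ge \pi_i \ge 0;\\
& \pi_s = 0,\ \pi_d = 1.
\end{cases}
\tag{20}
$$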

One finds that, up to an obvious change of variables from $\mu$ to $\eta$ and from $\pi$ to $p$, the
LP-B of Eq. (20) is identical to the LP-A (11). According to Theorem 6.1 of [15],
solutions of Eq. (20), or Eq. (11), are integers, $\forall (i,j)\in\mathcal{G}'$: $\mu_{ij}, \eta_{ij} = 0, 1$ and $\forall i\in\mathcal{G}'$:
$p_i, \pi_i = 0, 1$.

Summarizing, it was just shown that as $T \to 0$ the BP solution of the FRFI model,
understood as the global minimum of the Bethe free energy, is also the ML solution of
the model.
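The change of variables behind Eqs. (19), (20) can be illustrated numerically. The sketch below draws a random normalized pair belief and checks two facts used above: the edge correlation equals $1 - 2\mu_{ij}$, and the marginalization constraint implies $\pi_i - \pi_j + \mu_{ij} \ge 0$ in both orderings. The helper names are illustrative, and the random belief is only a test input.

```python
import random

STATES = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]

def random_pair_belief(rng):
    """Random normalized belief b_ij(sigma_i, sigma_j) on the four edge states."""
    weights = [rng.random() for _ in STATES]
    z = sum(weights)
    return {state: w / z for state, w in zip(STATES, weights)}

def reduce_belief(b):
    """Return (mu_ij, pi_i, pi_j, correlation) for a pair belief b."""
    mu = b[(+1, -1)] + b[(-1, +1)]         # Eq. (19)
    pi_i = b[(-1, +1)] + b[(-1, -1)]       # b_i(-), marginal over sigma_j
    pi_j = b[(+1, -1)] + b[(-1, -1)]       # b_j(-), marginal over sigma_i
    corr = sum(p * si * sj for (si, sj), p in b.items())
    return mu, pi_i, pi_j, corr

def check_reduction(trials=100, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        mu, pi_i, pi_j, corr = reduce_belief(random_pair_belief(rng))
        assert abs(corr - (1 - 2 * mu)) < 1e-12
        assert pi_i - pi_j + mu >= -1e-12  # constraint of Eq. (20)
        assert pi_j - pi_i + mu >= -1e-12  # same constraint, other ordering
        assert -1e-12 <= mu <= 1 + 1e-12
    return True
```

Since the objective of Eq. (18) depends on the pair beliefs only through the correlation, this check makes explicit why the optimization can be restated in terms of $\mu$ and $\pi$ alone.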
3. Binary model with Totally Uni-Modular Constraints

Consider $N$ binary variables combined in the vector $\sigma = (\sigma_i = 0, 1 | i = 1, \cdots, N)$, and
associate the following normalized probability to any possible value of the vector

$$
p(\sigma) = Z^{-1}\exp\left(-T^{-1}\sum_{i} h_i\sigma_i\right)\prod_{a}\delta\left(\sum_{i} J_{ai}\sigma_i,\ m_a\right),
\tag{21}
$$

where $\delta(x,y)$ is one if $x = y$ and it is zero otherwise; $a = 1, \cdots, M$; the matrix $J =
(J_{ai} = 0, 1 | i = 1, \cdots, N;\ a = 1, \cdots, M)$ is Totally Uni-Modular (TUM), i.e. the determinant
of any square minor of the matrix is 0, 1 or −1; the vector $m = (m_a | a = 1, \cdots, M)$
is constructed from positive integers, so that $\forall a$: $m_a \le q_a \equiv \sum_i J_{ai}$. The partition
function $Z$ is introduced in Eq. (21) to guarantee normalization, $\sum_{\sigma} p(\sigma) = 1$.

The model Eq. (21) can be viewed as a graphical model defined on the bipartite
graph consisting of "bits", $\{i\}$, and "checks", $\{a\}$. Also, there may be other graphical
interpretations. Thus, for the weighted matching problem, e.g. studied in [2], the
binary variables in the formulation of Eq. (21) are associated with edges of the complete
bipartite graph. (In this case of the weighted matching, one can show that the resulting
matrix of constraints is indeed TUM.)
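For small matrices the TUM property can be verified directly from its definition, by computing the determinant of every square sub-matrix. The sketch below is a brute-force check (exponential in the matrix size, so meant for illustration only); the two example matrices are our own: the vertex-edge incidence matrix of the complete bipartite graph $K_{2,2}$, which is TUM, and that of a triangle, which is not.

```python
from itertools import combinations

def det(m):
    """Integer determinant by Laplace expansion (adequate for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)) if m[0][j])

def is_tum(matrix):
    """True if every square sub-matrix has determinant 0, +1 or -1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    for k in range(1, min(n_rows, n_cols) + 1):
        for rows in combinations(range(n_rows), k):
            for cols in combinations(range(n_cols), k):
                sub = [[matrix[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# Rows: vertices u0, u1, v0, v1; columns: edges (u0,v0), (u0,v1), (u1,v0), (u1,v1).
K22_INCIDENCE = [[1, 1, 0, 0],
                 [0, 0, 1, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1]]

# Rows: vertices 0, 1, 2; columns: edges (0,1), (1,2), (0,2) of a triangle.
TRIANGLE_INCIDENCE = [[1, 0, 1],
                      [1, 1, 0],
                      [0, 1, 1]]
```

The triangle fails the test because its full $3 \times 3$ incidence matrix has determinant 2, which is the standard obstruction for matchings on non-bipartite graphs.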

3.1. Efficient ML solution 

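We first observe that the problem of finding the Maximum Likelihood configuration of Eq. (21)
is equivalent to the following Integer Programming (IP) problem:

$$
\min_{\sigma}\sum_{i} h_i\sigma_i
\quad \text{s.t.} \quad
\begin{cases}
\forall a: & \sum_{i} J_{ai}\sigma_i = m_a;\\
\forall i: & \sigma_i = 0, 1.
\end{cases}
\tag{22}
$$

Relaxing the IP to the respective LP-A, with $\sigma_i = 0, 1$ changed to $s_i \in [0; 1]$,

$$
\min_{s}\sum_{i} h_i s_i
\quad \text{s.t.} \quad
\begin{cases}
\forall a: & \sum_{i} J_{ai} s_i = m_a;\\
\forall i: & 0 \le s_i \le 1,
\end{cases}
\tag{23}
$$

one finds that the relaxation is tight. In other words, the solutions of the IP problem and
the LP problem coincide. This is due to the Theorem (see e.g. Theorem
13.1 of [15]) stating that if $J$ is TUM and $m$ is integer, then all basic feasible solutions
(vertices of the feasible polytope) of the LP problem are integer.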
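To make the IP (22) concrete, the following sketch solves it by exhaustive enumeration for a tiny perfect matching instance on $K_{2,2}$ and compares the result with the most probable configuration of Eq. (21) at a finite temperature. The enumeration is exponential and serves only as a reference; the weights `H_EXAMPLE` and all function names are hypothetical.

```python
from itertools import product
from math import exp

def is_feasible(J, m, sigma):
    """Check the constraints sum_i J_ai sigma_i = m_a for every a."""
    return all(sum(J_ai * s for J_ai, s in zip(row, sigma)) == m_a
               for row, m_a in zip(J, m))

def cost(h, sigma):
    return sum(h_i * s for h_i, s in zip(h, sigma))

def ml_by_enumeration(h, J, m):
    """Solve the IP (22) by brute force; returns (cost, sigma)."""
    feasible = [s for s in product((0, 1), repeat=len(h)) if is_feasible(J, m, s)]
    best = min(feasible, key=lambda s: cost(h, s))
    return cost(h, best), best

def gibbs_distribution(h, J, m, T):
    """Normalized probabilities of Eq. (21) over the feasible configurations."""
    weights = {s: exp(-cost(h, s) / T)
               for s in product((0, 1), repeat=len(h)) if is_feasible(J, m, s)}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

# Perfect matching on K_{2,2}: variables are the edges
# (u0,v0), (u0,v1), (u1,v0), (u1,v1); each vertex must be matched once.
J_EXAMPLE = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
M_EXAMPLE = [1, 1, 1, 1]
H_EXAMPLE = [1.0, 3.0, 2.5, 1.5]
```

For this instance only the two perfect matchings are feasible, and the cheaper one, `(1, 0, 0, 1)`, is both the IP minimizer and the most probable configuration at any $T > 0$.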

3.2. Bethe Free Energy & BP 

Here we discuss the Bethe free energy/Belief Propagation (BP) approach to the model
defined in Eq. (21). The Bethe free energy functional is

$$
F = E - TS, \quad E = \sum_{i} h_i b_i(1), \quad
S = -\sum_{a}\sum_{\sigma_a} b_a(\sigma_a)\ln b_a(\sigma_a) + \sum_{i}(q_i - 1)\sum_{\sigma_i} b_i(\sigma_i)\ln b_i(\sigma_i),
\tag{24}
$$



Exactness of Belief Propagation 



for Some Graphical Models with Loops 






where the vector $\sigma_a = (\sigma_i | \forall i \text{ s.t. } J_{ai} = 1;\ \sum_i J_{ai}\sigma_i = m_a)$ defines the set of allowed
configurations of the variables marked by index $i$ associated and consistent with the given
constraint $a$, and $q_i \equiv \sum_a J_{ai}$ counts the constraints involving the variable $i$. For any given $m_a$ the number of such allowed vectors/configurations of
$\sigma_a$ is $q_a!/(m_a!\,(q_a - m_a)!)$. As usual, $b_i(\sigma_i)$ and $b_a(\sigma_a)$ are beliefs (estimations
for the respective probabilities) associated with the variables and the constraints. The
two types of beliefs are related to each other via the following compatibility constraints:

$$
\forall i\ \&\ \forall a \text{ s.t. } J_{ai} = 1:\quad b_i(\sigma_i) = \sum_{\sigma_a\setminus\sigma_i} b_a(\sigma_a),
\tag{25}
$$

and one should also impose the normalization constraint

$$
\forall i:\quad \sum_{\sigma_i} b_i(\sigma_i) = 1.
\tag{26}
$$

Incorporating the compatibility and normalization constraints, with the respective
Lagrangian multipliers, into the variational functional one derives the Lagrangian

$$
\mathcal{L} = F + \sum_{i}\sum_{a\ni i}\sum_{\sigma_i}\mu_{a,i}(\sigma_i)\left(b_i(\sigma_i) - \sum_{\sigma_a\setminus\sigma_i} b_a(\sigma_a)\right) + \sum_{i}\lambda_i\left(\sum_{\sigma_i} b_i(\sigma_i) - 1\right).
\tag{27}
$$

Looking for the stationary points of the Lagrangian with respect to all the beliefs and
the Lagrangian multipliers, $\lambda$, $\mu$, one arrives at the Belief Propagation equations for the
problem.

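3.3. $T \to 0$ limit of the Bethe free energy

In the $T \to 0$ limit the entropy term in Eq. (24) can be dropped and the problem turns
into minimization of the LP type

$$
\min_{\{b_i;\, b_a\}}\sum_{i} h_i b_i(1)
\quad \text{s.t.} \quad
\begin{cases}
\forall i, a: & 0 \le b_i(\sigma_i),\ b_a(\sigma_a) \le 1;\\
\forall i\ \&\ \forall a \text{ s.t. } J_{ai} = 1: & b_i(\sigma_i) = \sum_{\sigma_a\setminus\sigma_i} b_a(\sigma_a);\\
\forall i: & \sum_{\sigma_i} b_i(\sigma_i) = 1.
\end{cases}
\tag{28}
$$

It is straightforward to verify that the beliefs associated with $a$ could be completely
removed from Eq. (28), and the LP problem can be restated solely in terms of the
$i$-related variables, $\beta_i = b_i(1) = 1 - b_i(0)$.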

Let us illustrate this point on the example of a single constraint $a$ with $m_a = 2$ and
$q_a = 3$. Then the set of allowed $a$-beliefs is

$$
b_a(1,1,0), \quad b_a(1,0,1), \quad b_a(0,1,1),
\tag{29}
$$

and the respective set of relations (25) between $\beta_1, \beta_2, \beta_3$ associated with the check $a$
and the $a$-beliefs is

$$
\beta_1 = b_a(1,1,0) + b_a(1,0,1), \quad \beta_2 = b_a(1,1,0) + b_a(0,1,1), \quad \beta_3 = b_a(1,0,1) + b_a(0,1,1).
\tag{30}
$$

On the other hand the normalization condition, restated in terms of the $a$-beliefs (29),
is

$$
b_a(1,1,0) + b_a(1,0,1) + b_a(0,1,1) = 1.
\tag{31}
$$
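Summing Eqs. (30) and accounting for Eq. (31), one finds

$$
\beta_1 + \beta_2 + \beta_3 = 2.
\tag{32}
$$

In general, every allowed configuration $\sigma_a$ contains exactly $m_a$ ones, so that summing the
relations (25) over the variables of the constraint and using the normalization one finds that
the relation between the $\beta$ variables associated with an $a$-constraint is

$$
\sum_{i} J_{ai}\beta_i = m_a.
\tag{33}
$$

One derives that Eq. (28) reduces to a simpler LP-B problem stated solely in terms
of the $\beta$ variables

$$
\min_{\beta}\sum_{i} h_i\beta_i
\quad \text{s.t.} \quad
\begin{cases}
\forall a: & \sum_{i} J_{ai}\beta_i = m_a;\\
\forall i: & 0 \le \beta_i \le 1.
\end{cases}
\tag{34}
$$

Furthermore, one observes that, up to re-definition of $\beta_i$ to $s_i$, Eq. (34) is equivalent
to Eq. (23). In other words, we just showed that the $T \to 0$ solution of the BP equations,
understood as the global minimum of the Bethe free energy, is tight, i.e. it gives exactly
the ML solution of the binary model (21).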
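The elimination of the $a$-beliefs can be checked numerically for a single constraint. The sketch below enumerates the allowed configurations of $\sigma_a$ (binary vectors of length $q_a$ with exactly $m_a$ ones), draws a random normalized belief over them, computes $\beta_i = b_i(1)$ by marginalization, and verifies that the $\beta_i$ sum to $m_a$. The function names and the random test beliefs are illustrative assumptions.

```python
from itertools import combinations
from math import comb
import random

def allowed_configs(q_a, m_a):
    """All binary vectors of length q_a with exactly m_a ones."""
    return [tuple(1 if i in ones else 0 for i in range(q_a))
            for ones in combinations(range(q_a), m_a)]

def random_check_belief(q_a, m_a, rng):
    """Random normalized belief b_a over the allowed configurations."""
    configs = allowed_configs(q_a, m_a)
    weights = [rng.random() for _ in configs]
    z = sum(weights)
    return {cfg: w / z for cfg, w in zip(configs, weights)}

def beta_from_check_belief(b_a):
    """beta_i = b_i(1): sum of b_a over configurations with sigma_i = 1."""
    q_a = len(next(iter(b_a)))
    return [sum(p for cfg, p in b_a.items() if cfg[i] == 1)
            for i in range(q_a)]

def check_beta_sum(q_a, m_a, seed=0):
    rng = random.Random(seed)
    beta = beta_from_check_belief(random_check_belief(q_a, m_a, rng))
    assert all(-1e-12 <= x <= 1 + 1e-12 for x in beta)
    return abs(sum(beta) - m_a) < 1e-12
```

For $q_a = 3$, $m_a = 2$ the function `allowed_configs` returns the three configurations of Eq. (29), and `check_beta_sum(3, 2)` reproduces Eq. (32) for a randomly chosen belief.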

As a side remark, one notes that it is suggestive to start exploration of the Bethe 
Free Energy at finite T from the LP solution discussed above. It might be especially 
useful to initiate BP with the (easy to get) LP solution when the Bethe Free Energy 
optimization at finite T is non-convex. 

4. Summary and Path Forward 

In this work we discussed easy problems for which a zero-temperature BP scheme generates
the exact ML result. We argued that this special feature of BP is due to the fact that
the related LP optimization is tight (i.e. the LP outputs the ML solution as well).
Our consideration was based on the flexibility and convenience provided by the
so-called Bethe free energy formulation, naturally relating BP and LP. The results were
illustrated on two examples, the FRFI model and the perfect matching model. Also, we briefly
discussed a broader class of easy examples related to LP with a TUM matrix of constraints.

We conclude by briefly mentioning some future challenges which follow from our
analysis.

• It is useful to continue exploring other models of statistical inference
with loops allowing computationally efficient optimal solutions. Thus, it would
be interesting to find examples of "easy" non-binary problems, as well as those which
allow efficient and optimal finite temperature evaluation of marginal probabilities
or the partition function. In this context, one mentions the exactness of BP marginals
at any temperature known to hold for the continuous variable Gaussian model on an
arbitrary graph [21], [13] and also the recently established, $T \to 0$, relation between an
iterative algorithm of BP type and the Quadratic optimization problem [T^ .







• Probably the most intriguing future challenge is to analyze problems that are not
computationally easy, but still close, in some metric, to easy problems. Thus, the
models discussed above, when considered at finite rather than zero temperature, may
not allow an explicit efficient solution. Similarly, a perturbation of the FRFI model
with some number of local graph frustrations (e.g. some number of randomly
thrown negative $J_{ij}$ violating the TUM feature of the model) sets another "close to
easy" problem of theoretical and applied interest. As suggested in [12], BP can be
utilized as an efficient heuristic in these "close to easy" cases. Note that in this case
finding minima of the Bethe free energy may be a challenge, and the problem turns
into the quest of devising efficient algorithms for the optimization of non-convex
functions [25], [26]. Here the novel BP-convexification ideas developed in [20], [12], [22], [9]
might be helpful. Notice also that the loop calculus approach of [5], [6] is another
useful tool which may come in handy in perturbative and non-perturbative analysis
of these "close to easy" problems.

• BP is the algorithm of choice for decoding of error-correction codes stated in terms 
of sparse graphs [8]. On the other hand, the above discussion suggests that for 
BP to decode optimally, or close to optimally, the graphical structure should not 
necessarily be sparse. Therefore, an intriguing question is: can one design a class of 
dense codes decoded optimally (or close to optimally) by an algorithm of BP type? 

The author acknowledges inspiring discussions with V. Chernyak, M. Vergassola, D. 
Shah, B. Shraiman and M. Wainwright. The work was carried out under the auspices of 
the National Nuclear Security Administration of the U.S. Department of Energy at Los 
Alamos National Laboratory under Contract No. DE-AC52-06NA25396. The author 
also acknowledges the Weston Visiting Professorship Program supporting his stay at 
the Weizmann Institute, where the work was completed. 